A complete and efficient CUDA-sharing solution for HPC clusters
Authors
Abstract
In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes,...
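As a rough illustration of the transparency the abstract describes, below is a minimal, self-contained CUDA vector-addition program; the kernel name, sizes, and values are illustrative and not taken from the paper. An application written this way uses only the standard CUDA runtime API (cudaMalloc, cudaMemcpy, kernel launch), which is what lets a remote-GPU framework such as rCUDA intercept the calls and, in principle, service them with a GPU elsewhere in the shared pool without modifying the application.

    // Minimal CUDA runtime program (vector addition). Nothing here is
    // specific to rCUDA: the application only calls the standard CUDA
    // runtime API, so a remote-GPU layer can forward these calls to a
    // GPU on another node (an assumption for illustration, not the
    // paper's implementation details).
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        float *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes);   // device allocations go through the
        cudaMalloc(&d_b, bytes);   // CUDA runtime, wherever the GPU lives
        cudaMalloc(&d_c, bytes);

        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", h_c[0]);   // expect 3.0

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }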
Related resources

CUDA-For-Clusters: A System for Efficient Execution of CUDA Kernels on Multi-core Clusters
Rapid advancements in multi-core processor architectures along with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs to efficiently utilize all the resources in such a cluster is still a major challenge. Programmers have to manually deal with low-level details that should idea...
Efficient and Scalable Operating System Provisioning for HPC Clusters
Operating system provisioning is a common and critical task in cluster computing environments. The required low-level operations involved in provisioning can drastically decrease the performance of a given solution, and maintaining a reasonable provisioning time on clusters of 1000+ nodes is a significant challenge. We present Kadeploy3, a tool built to efficiently and reliably deploy a large n...
Productivity in HPC Clusters
This presentation discusses HPC productivity in terms of: (1) effective architectures, (2) parallel programming models, and (3) applications development tools. The demands placed on HPC by owners and users of systems ranging from public research laboratories to private scientific and engineering companies enrich the topic with many competing technologies and approaches. Rather than expecting to...
Efficient CUDA
Recent work in deep learning has created a voracious demand for more compute cycles, but with the slowdown of Moore’s law, CPUs cannot keep up with the demand. Thus, attention has turned to massively-parallel hardware accelerators, ranging from new processing units designed specifically for deep learning to the repurposing of existing technologies like field-programmable gate arrays (FPGAs) and ...
Journal
Journal title: Parallel Computing
Year: 2014
ISSN: 0167-8191
DOI: 10.1016/j.parco.2014.09.011